
Improve task directory structure, design system integration, and TDD guidance #2

Merged
etr merged 4 commits intoetr:mainfrom
codingarchitect-wq:improve-tasks-dir-structure
Mar 14, 2026
Conversation

@codingarchitect-wq
Contributor

Summary

  • Tasks directory restructure: Changes task storage from a single specs/tasks.md file to a specs/tasks/ directory with per-milestone
    subdirectories (M1-core-auth/, M2-upload/), individual TASK-NNN.md files, a centralized _index.md with a status table, and a parking-lot.md
    for deferred tasks. Updates tasks, next-task, execute-task, and task-validation-loop skills to work with the new structure.
  • Design system & spec context loading: build-unplanned-feature, execute-task, and product-design skills now load existing specs
    (architecture, design system, PRD) before planning, so implementations follow established patterns and contradictions are caught early.
  • TDD anti-patterns consolidated: Inlines the separate testing-anti-patterns.md reference into the main TDD skill with concrete code examples
    and decision gates. Removes the standalone file.
  • Validation loop TDD enforcement: Behavioral fixes discovered during validation must now follow TDD (write failing test first), while cosmetic
    fixes can be applied directly.
  • Next-task optimization: next-task now reads only _index.md status table instead of aggregating all task files.
  • Product design flow: Adds "Create tasks" as a next-step option after PRD updates.
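For concreteness, the restructure described above might produce a layout like the following sketch (the milestone directories and `TASK-NNN.md` naming come from the PR description; the specific task files shown are illustrative):

```
specs/tasks/
├── _index.md          # centralized status table, one row per task
├── parking-lot.md     # deferred tasks
├── M1-core-auth/
│   ├── TASK-001.md
│   └── TASK-002.md
└── M2-upload/
    └── TASK-003.md
```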

Owner

@etr etr left a comment


Thanks! There is a lot of goodness here. Just one minor comment on the split of the task file.

```diff
- Iterate until user approves, then write to `specs/tasks.md`.
+ Iterate until user approves, then write task files to `specs/tasks/`.
```

**Output structure:**
Owner


Did you notice any reduction in quality by splitting the tasks? In the past, I've noticed the agent exploring the tasks file to look for future tasks to understand whether it was the right/wrong time to do something.

Not sure if that's less, more, or equally effective with multiple files - I don't have evidence either way, but I'm curious whether you had done some testing on it.

Contributor Author


The following is written entirely by me, no AI involved :)

First of all: congratulations on the great plugin. It matches my workflow and quality requirements the best so far. The TDD and Design System approaches are really great.

I have a "benchmark" setup where I evaluated stock Claude Code, GSD, superpowers, groundwork, and my own variation named Forge with the "best" features of each (still on my list to evaluate: Paul, agent-os, BMAD-Method, claude-conductor, OpenSpec). So far your plugin gave me the best results, but there are some things that I believe can be improved.

The main issues I am trying to solve are:

  • context management for the orchestrator: a growing context is probably the biggest factor affecting output quality, so I try to keep it as small as possible:

    • since tasks can grow unbounded, I split tasks into separate files and keep a status table in `_index.md`. This way only the relevant task is loaded into context and lookup is fast.
    • another idea is to have the plan agent not dump its output back to the orchestrator, but rather work with plan files so the orchestrator just gets the file path. This is a bigger refactor, since the execute-task skill validates the plan and would still read the file. I would need to find another solution: either the plan agent validates the plan itself, or maybe a plan-validator agent could do that. The goal is to keep the orchestrator context slim.
  • tests quality

    • this was always my biggest pain: Claude Code was not following TDD, cutting corners, and claiming tests were passing when they were failing :D. Your plugin is amazingly good at enforcing TDD, but the test quality is still not great. It probably resembles the training data, since not many projects have good tests :D
    • so I added more guidance on good and bad testing practices.

Owner


The following is written entirely by me, no AI involved :)

So fun to live at a time where we have to say this :)

I have a "benchmark" setup where I evaluated stock Claude Code, GSD, superpowers, groundwork, and my own variation named Forge with the "best" features of each (still on my list to evaluate: Paul, agent-os, BMAD-Method, claude-conductor, OpenSpec). So far your plugin gave me the best results, but there are some things that I believe can be improved.

Thanks :). I am happy it is helping you and yeah, there are definitely a few things I think could benefit from improving.

since tasks can grow unbounded, I split tasks into separate files and keep a status table in `_index.md`. This way only the relevant task is loaded into context and lookup is fast.

Got it - that makes sense and answers my question. The constant struggle appears to be "how to strike a balance between more information and context rot" - I think we will all have to fight this for a while, but your intuition of splitting data and using agents is, I believe, the right direction.

another idea is to have the plan agent not dump its output back to the orchestrator, but rather work with plan files so the orchestrator just gets the file path. This is a bigger refactor, since the execute-task skill validates the plan and would still read the file. I would need to find another solution: either the plan agent validates the plan itself, or maybe a plan-validator agent could do that. The goal is to keep the orchestrator context slim.

I think this is where I had started from and had to back away for some reason (my context, just like the agent's, had rotted, I suspect), so exploring this direction might be a smart thing to do.

so I added more guidance on good and bad testing practices.

Those are very helpful. I am testing locally a new agent to be added to the validation loop that checks specifically the quality of tests so I can give it a better understanding of what good versus bad tests are.

@etr etr merged commit e2fead2 into etr:main Mar 14, 2026
1 check passed
